All Questions
5 questions
1 vote, 1 answer, 87 views
Deep RL problem: Loss decreases but agent doesn't learn
I'm implementing a basic vanilla policy gradient (VPG) algorithm for the Gymnasium CartPole-v1 environment, and I don't know what I'm doing wrong. No matter what I try, during the training loop the loss ...
1 vote, 1 answer, 603 views
What is wrong with my actor-critic implementation?
I have been implementing both REINFORCE with baseline and actor-critic to solve CartPole-v1. As a reminder, here is the presentation of the algorithms in Sutton and Barto's book (http://...
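For reference, here is a minimal pure-Python sketch (not the asker's code) of the weight each algorithm multiplies into the policy-gradient update in Sutton and Barto's formulation: the full return minus a learned baseline for REINFORCE with baseline, and the one-step TD error for one-step actor-critic. The function names and the per-step `values`, `next_values` and `dones` lists are hypothetical. The book's pseudocode also scales each update by a power of the discount factor, which is left out here.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = R_{t+1} + gamma * G_{t+1} for every step of one finished episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_baseline_deltas(returns, values):
    """REINFORCE with baseline: delta_t = G_t - v(S_t), v being the learned state-value baseline."""
    return [g - v for g, v in zip(returns, values)]


def actor_critic_deltas(rewards, values, next_values, dones, gamma=0.99):
    """One-step actor-critic: delta_t = R_{t+1} + gamma * v(S_{t+1}) - v(S_t).

    The bootstrap term is dropped at terminal steps, since v(terminal) = 0.
    """
    deltas = []
    for r, v, v_next, done in zip(rewards, values, next_values, dones):
        target = r if done else r + gamma * v_next
        deltas.append(target - v)
    return deltas


rewards = [1.0, 1.0, 1.0]  # e.g. three CartPole steps, the last one terminal
values = [1.0, 1.0, 1.0]   # hypothetical critic outputs v(S_t)
returns = discounted_returns(rewards, gamma=0.5)
print(returns)  # [1.75, 1.5, 1.0]
print(reinforce_baseline_deltas(returns, values))  # [0.75, 0.5, 0.0]
print(actor_critic_deltas(rewards, values, [1.0, 1.0, 0.0], [False, False, True], gamma=0.5))  # [0.5, 0.5, 0.0]
```

With Gymnasium, `dones` here should follow `terminated` rather than `truncated`: a time-limit cutoff is not a true terminal state, so the critic's estimate should still be bootstrapped there.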
1 vote, 1 answer, 2k views
DDPG doesn't converge on the MountainCarContinuous-v0 Gym environment
I am trying to implement the Deep Deterministic Policy Gradient (DDPG) algorithm, following the paper "Continuous control with deep reinforcement learning", on the MountainCarContinuous-v0 Gym environment. I ...
1 vote, 0 answers, 391 views
Will subtracting the entropy from our policy gradient prevent our agent from getting stuck in a local minimum?
In information theory, entropy is a measure of uncertainty in a system. Applied to an agent's policy, entropy shows how uncertain the agent is about which action to take. In math ...
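A minimal sketch, assuming a categorical (discrete-action) policy such as CartPole's, of the entropy H = -Σ π(a|s) log π(a|s) and of the usual sign convention: the entropy term is subtracted from the loss being minimized, which is the same as adding an entropy bonus to the objective being maximized. The function names and the coefficient `beta` are hypothetical, and plain floats stand in for the differentiable tensors a real implementation would use.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a categorical action distribution pi(.|s)."""
    # Terms with p == 0 contribute 0 (the limit of p * log p as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_loss(log_prob, advantage, probs, beta=0.01):
    """Single-sample policy-gradient loss with a hypothetical entropy coefficient beta.

    Minimizing -log_prob * advantage performs gradient ascent on the usual
    policy-gradient objective; subtracting beta * entropy from that loss is the
    same as adding it to the objective, so less certain policies are favored.
    """
    return -log_prob * advantage - beta * policy_entropy(probs)


uniform = [0.5, 0.5]   # maximally uncertain over CartPole's two actions
peaked = [0.99, 0.01]  # nearly deterministic policy
print(round(policy_entropy(uniform), 4))  # 0.6931, i.e. ln 2, the maximum for two actions
print(policy_entropy(peaked) < policy_entropy(uniform))  # True
print(entropy_regularized_loss(math.log(0.99), 1.0, peaked, beta=0.01))
```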
3 votes, 0 answers, 44 views
Reinforcement learning on a quantum circuit
I am trying to teach an agent to drive any random 1-qubit state to a uniform superposition. So, basically, the full circuit will be ...